image-only model


MRI Plane Orientation Detection using a Context-Aware 2.5D Model

arXiv.org Artificial Intelligence

Humans can easily identify anatomical planes (axial, coronal, and sagittal) on a 2D MRI slice, but automated systems struggle with this task. Missing plane orientation metadata can complicate analysis, increase domain shift when merging heterogeneous datasets, and reduce accuracy of diagnostic classifiers. This study develops a classifier that accurately generates plane orientation metadata. We adopt a 2.5D context-aware model that leverages multi-slice information to avoid ambiguity from isolated slices and enable robust feature learning. We train the 2.5D model on both 3D slice sequences and static 2D images. While our 2D reference model achieves 98.74% accuracy, our 2.5D method raises this to 99.49%, reducing errors by 60%, highlighting the importance of 2.5D context. We validate the utility of our generated metadata in a brain tumor detection task. A gated strategy selectively uses metadata-enhanced predictions based on uncertainty scores, boosting accuracy from 97.0% with an image-only model to 98.0%, reducing misdiagnoses by 33.3%. We integrate our plane orientation model into an interactive web application and provide it open-source.
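The error-reduction figures and the gated strategy in this abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is scored, so the use of prediction entropy, the threshold value, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def relative_error_reduction(acc_before, acc_after):
    """Fraction of errors removed when accuracy (in percent) improves."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before


def entropy(probs):
    """Shannon entropy (nats) of a probability vector; an assumed uncertainty score."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def gated_predict(p_image_only, p_with_metadata, threshold):
    """Use the metadata-enhanced prediction only when the image-only
    prediction is uncertain (entropy above a hypothetical threshold)."""
    uncertain = entropy(p_image_only) > threshold
    chosen = p_with_metadata if uncertain else p_image_only
    return max(range(len(chosen)), key=chosen.__getitem__)


# 98.74% -> 99.49% removes about 59.5% of errors (reported as 60%).
plane_reduction = relative_error_reduction(98.74, 99.49)
# 97.0% -> 98.0% removes one third of errors (reported as 33.3%).
tumor_reduction = relative_error_reduction(97.0, 98.0)
```

With a threshold of 0.5, a confident image-only output such as `[0.9, 0.1]` is kept as is, while an uncertain one such as `[0.55, 0.45]` defers to the metadata-enhanced prediction.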


Conformal Prediction for Multimodal Regression

arXiv.org Artificial Intelligence

This paper introduces multimodal conformal regression. Traditionally confined to scenarios with solely numerical input features, conformal prediction is now extended to multimodal contexts through our methodology, which harnesses internal features from complex neural network architectures processing images and unstructured text. Our findings highlight the potential for internal neural network features, extracted from convergence points where multimodal information is combined, to be used by conformal prediction to construct prediction intervals (PIs). This capability paves new paths for deploying conformal prediction in domains abundant with multimodal data, enabling a broader range of problems to benefit from guaranteed distribution-free uncertainty quantification.
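As background, prediction intervals of this kind are commonly built with split conformal prediction. The sketch below shows that standard procedure on absolute residuals; it is not the paper's method, and it assumes the point predictions come from some already-trained model (for example one consuming fused multimodal features). All names are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile of calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested coverage.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha):
    """Prediction interval for one test point from held-out calibration data.

    cal_preds / cal_targets: predictions and true values on a calibration
    set that was not used to train the model.
    """
    scores = [abs(y - p) for p, y in zip(cal_preds, cal_targets)]
    q = conformal_quantile(scores, alpha)
    return (test_pred - q, test_pred + q)


# Toy calibration set with absolute residuals 1..9.
interval = split_conformal_interval([0.0] * 9, list(range(1, 10)), 10.0, alpha=0.5)
```

Under exchangeability of calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha, whatever the underlying model.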


Modality-Balanced Models for Visual Dialogue

arXiv.org Artificial Intelligence

The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue. However, via manual analysis, we find that a large number of conversational questions can be answered by only looking at the image without any access to the context history, while others still need the conversation context to predict the correct answers. We demonstrate that, for this reason, previous joint-modality (history and image) models over-rely on and are more prone to memorizing the dialogue history (e.g., by extracting certain keywords or patterns in the context information), whereas image-only models are more generalizable (because they cannot memorize or extract keywords from history) and perform substantially better on the primary task metric, normalized discounted cumulative gain (NDCG), which allows multiple correct answers. Hence, this observation encourages us to explicitly maintain two models, i.e., an image-only model and an image-history joint model, and combine their complementary abilities for a more balanced multimodal model. We present multiple methods for integrating the two models, via ensemble and consensus dropout fusion with shared parameters. Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics), and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics.
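To make the ensemble idea and the NDCG metric concrete, here is a minimal sketch. It assumes each model outputs one probability per candidate answer and that the ensemble simply averages them; the abstract does not specify the combination rule, and the function names and toy numbers are hypothetical.

```python
import math


def ensemble_rank(p_image_only, p_joint):
    """Rank candidate answers by the average of the two models' scores."""
    avg = [(a + b) / 2 for a, b in zip(p_image_only, p_joint)]
    return sorted(range(len(avg)), key=lambda i: -avg[i])


def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances))


def ndcg(relevance, ranking, k):
    """NDCG@k: DCG of the predicted ranking divided by the ideal DCG."""
    gained = [relevance[i] for i in ranking[:k]]
    ideal = sorted(relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gained) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example: three candidates, two of them at least partly correct.
relevance = [0.0, 1.0, 0.5]
ranking = ensemble_rank([0.1, 0.6, 0.3], [0.5, 0.2, 0.3])
score = ndcg(relevance, ranking, k=3)
```

Because relevance is graded rather than binary, a ranking that places a partly correct answer below an incorrect one is penalized but still earns partial credit.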


Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

arXiv.org Machine Learning

Among these, only 20-40% yield a diagnosis of cancer (5). This paper makes several contributions. In the reader study, we compared the performance of our best model to that of radiologists and found our model to be as accurate as radiologists both in terms of area under the ROC curve (AUC) and area under the precision-recall curve (PRAUC). We also found that a hybrid model, taking the average of the probabilities of malignancy predicted by a radiologist and by our neural network, yields more accurate predictions than either of the two separately. This suggests that our network and radiologists learned different aspects of the task and that our model could be effective as a tool providing radiologists with a second reader. With this contribution, research groups working on improving screening mammography, which may not have access to a large training dataset like ours, will be able to use our model directly in their research or use our pretrained weights as an initialization to train models with less data. By making our models public, we invite other groups to validate our results and test their robustness to shifts in the data distribution. The dataset includes 229,426 digital screening mammography exams (1,001,093 images) from 141,473 patients. For each breast, we assign two binary labels, including the absence/presence of malignant findings in that breast. With left and right breasts, each exam has a total of four binary labels. We have 5,832 exams with at least one biopsy performed within 120 days of the screening mammogram. Among these, biopsies confirmed malignant findings for 985 (8.4%) breasts and benign findings for 5,556 (47.6%) breasts. The authors declare no conflict of interest.
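The hybrid model described here, an average of the radiologist's and the network's malignancy probabilities, can be sketched together with a pairwise AUC computation. The numbers below are contrived toy values chosen only to show how averaging can help when two readers make different mistakes; they are not data from the paper, and the function names are hypothetical.

```python
def hybrid_probabilities(p_radiologist, p_network):
    """Average the two malignancy probabilities for each case."""
    return [(r + n) / 2 for r, n in zip(p_radiologist, p_network)]


def auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy cases: 1 = malignant, 0 = not malignant.
labels = [1, 1, 0, 0]
p_rad = [0.9, 0.3, 0.4, 0.1]  # radiologist under-scores the second case
p_net = [0.4, 0.8, 0.2, 0.5]  # network over-scores the fourth case
p_hybrid = hybrid_probabilities(p_rad, p_net)

auc_rad = auc(labels, p_rad)
auc_net = auc(labels, p_net)
auc_hybrid = auc(labels, p_hybrid)
```

In this toy setup each reader misorders one positive-negative pair, and the averaged scores order every pair correctly, which is the kind of complementarity the abstract attributes to the hybrid.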